9 September 2015

Basic biology

Sodium bisulfite treatment of DNA

TODO: Add image source; note 5-hydroxymethylcytosine (5hmC), which bisulfite treatment cannot distinguish from 5-methylcytosine (5mC)

Assays

Bisulfite-based assays

This talk focuses on contemporary high-throughput assays based on sodium bisulfite treatment of DNA

Microarrays

  • Illumina 27k
  • Illumina 450k

Sequencing

  • Whole-genome bisulfite-sequencing (WGBS/BS-seq/methylC-seq)
  • (Enhanced/Extended) Reduced representation bisulfite-sequencing (eRRBS/RRBS)
  • Capture + bisulfite-sequencing (Roche SeqCap Epi system)

Non-bisulfite assays

Based on an enrichment/pulldown of methylated DNA and/or restriction enzymes

  • Methylated DNA immunoprecipitation + microarray/sequencing (MeDIP-microarray/MeDIP-seq/mDIP-seq)
  • Methylation-sensitive restriction enzyme + sequencing (MRE-seq)
  • Methyl binding domain protein-enrichment + sequencing (MBD-seq)

The analysis pipeline

Most assay-specific:

  • Getting data into R
  • Pre-processing
  • Analysis

Somewhere in between:

  • Batch effects

Less assay-specific:

  • Visualisation
  • Data integration

How can Bioconductor help?

Bioconductor 3.1 packages (based on the DNAMethylation biocViews term):

  • 42 software packages
  • 6 annotation packages

Disclaimer

I can't tell you everything

  • 25 minutes
  • I don't know everything!

Will tell you:

  • What I find useful as a fairly experienced user & developer
  • What I am most familiar with
  • Where to find out more

Microarrays

Description of assays

What these measure

  • Red + green fluorescence intensities reflecting methylated and unmethylated signal
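
The two intensities are usually summarised per probe as a beta-value or an M-value. A minimal sketch in plain Python, assuming the commonly used conventions beta = Meth / (Meth + Unmeth + offset) and M = log2((Meth + alpha) / (Unmeth + alpha)); the default offset of 100 and alpha of 1 are conventional choices rather than anything fixed by the array, and the intensities below are made up for illustration:

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Proportion-like methylation level: Meth / (Meth + Unmeth + offset)."""
    return meth / (meth + unmeth + offset)


def m_value(meth, unmeth, alpha=1.0):
    """Log2 ratio of methylated to unmethylated signal, with a small pseudo-count."""
    return math.log2((meth + alpha) / (unmeth + alpha))


# Hypothetical intensities for a single probe (not real data)
meth, unmeth = 3000.0, 1000.0
beta = beta_value(meth, unmeth)
m = m_value(meth, unmeth)
print(f"beta = {beta:.3f}, M = {m:.3f}")
```

Beta-values are easier to interpret as a proportion methylated; M-values are unbounded and tend to behave better in linear models.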

Illumina 27k

  • Infinium HumanMethylation27 BeadChip
  • ~28,000 cytosines
  • Mostly in promoters of ~15,000 genes
  • Infinium I probes
  • Deprecated? But much of the TCGA data uses this platform.

Illumina 450k

  • Infinium HumanMethylation450 BeadChip
  • ~486,000 cytosines
  • Promoters, gene bodies, 3' UTR, intergenic
  • 135k Infinium I and 350k Infinium II probes
  • An overview of 450k technology

Key packages

Data ingest

File formats

  • .idat files
  • Files returned by Illumina's BeadStudio

minfi

  • read.450k(), read.450k.exp(), read.450k.sheet()
  • readTCGA()
  • readGEORawFile()

Pre-processing

Quality control

Pre-processing

Standard microarray issues

  • Failed probes
  • Cross-reactive probes
  • Background correction
  • Colour (dye) bias adjustment
  • Normalisation

450k-specific issues

  • Type I and II probes are very different
  • CpG-SNPs

Methylation-specific(ish) issues

  • Cell composition artefacts

Pre-processing

  • Several pre-processing options are available in Bioconductor packages, and some comparisons have been published (e.g., wateRmelon)

Tools

Downstream analyses

Differential methylation

Differentially methylated probes (DMPs)

  • For a given probe, are the group-average level(s) of methylation different?
  • limma is the workhorse, e.g., minfi::dmpFinder()
  • OPINION: You want a pretty good reason not to use a limma-based approach
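
To make the per-probe question concrete, here is a rough sketch in plain Python that computes an ordinary Welch t-statistic for each probe and ranks probes by its absolute value. This is deliberately not limma: limma fits a linear model and moderates the per-probe variances with empirical Bayes, which matters a great deal with few samples. The probe names and M-values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample Welch t-statistic (unequal variances) for one probe."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Hypothetical M-values: probe -> (group A samples, group B samples)
m_values = {
    "probe_1": ([2.1, 1.9, 2.0], [-1.0, -1.2, -0.8]),
    "probe_2": ([0.1, -0.1, 0.0], [0.05, -0.05, 0.1]),
}

# Rank probes by the magnitude of the group difference relative to its noise
ranked = sorted(
    ((name, welch_t(a, b)) for name, (a, b) in m_values.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, t in ranked:
    print(f"{name}: t = {t:.2f}")
```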

Differentially methylated regions (DMRs)

  • Identify regions with different group-average level(s) of methylation
  • OPINION: DMR finding/testing is somewhat ad hoc, but getting better
  • E.g., see minfi::blockFinder() with the bumphunter::bumphunter() backend

Differential variability

  • For a given probe, are the within-group variances of methylation levels different?
  • missMethyl::varFit() and missMethyl::topVar(), based on limma

Sequencing

Description of assays

What these measure

Single-base resolution of cytosine methylation

Whole-genome bisulfite-sequencing

  • Gold standard
  • Genome-wide assay
  • Expensive
  • 2-30x sequencing coverage

Targeted bisulfite-sequencing

  • Reduced representation bisulfite-sequencing (RRBS) and SeqCap
  • Cheaper
  • 20-60x sequencing coverage

Key packages

Still being figured out …

OPINION

  • bsseq for whole-genome bisulfite-sequencing
  • BiSeq for reduced representation bisulfite-sequencing
  • RnBeads for comprehensive WGBS/RRBS pipeline
  • Other non-R/Bioconductor options TODO: Ref Mark's paper

Non-R preliminaries

Pipeline (including most pre-processing)

Input to Bioconductor

  • Some aligner-specific file format with the following data per-sample:
chr  pos M  U
chr7 666 13 2
chr7 685 12 0
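
A small sketch of reading such a table, written in plain Python rather than with any Bioconductor reader. It assumes that M and U are the counts of methylated and unmethylated reads at each position, so that coverage is M + U and the per-position methylation level is M / (M + U); the function name `parse_counts` is made up for this example.

```python
def parse_counts(text):
    """Parse whitespace-delimited 'chr pos M U' lines into per-position records."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    records = []
    for fields in rows:
        rec = dict(zip(header, fields))
        m, u = int(rec["M"]), int(rec["U"])
        cov = m + u
        records.append({
            "chr": rec["chr"],
            "pos": int(rec["pos"]),
            "M": m,
            "U": u,
            "cov": cov,
            # Methylation level is undefined where there is no coverage
            "beta": m / cov if cov > 0 else None,
        })
    return records


example = """chr  pos M  U
chr7 666 13 2
chr7 685 12 0
"""
records = parse_counts(example)
for r in records:
    print(r["chr"], r["pos"], r["cov"], r["beta"])
```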

Pre-processing

  • Most done prior to reading data into R
  • More that probably ought to be done

Some issues

  • CpGs overlapping genetic variants
  • Copy number variation
  • Normalisation (not much available yet)

Smoothing \(\beta\)-values

Cartoon (see e.g., bsseq::BSmooth() for a proper implementation)
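
In the same cartoon spirit, a sketch of the idea in plain Python: borrow strength from neighbouring CpGs by taking a kernel-weighted average in which high-coverage positions count for more. This is not the BSmooth algorithm; it is a simpler coverage-weighted tricube average swapped in for illustration, and the bandwidth, positions and counts are hypothetical.

```python
def tricube(distance, bandwidth):
    """Tricube kernel weight: 1 at distance 0, falling to 0 at the bandwidth."""
    u = abs(distance) / bandwidth
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def smooth_beta(pos, meth, cov, bandwidth=1000):
    """Kernel- and coverage-weighted methylation level at each position.

    Equivalent to a weighted mean of raw M/cov values with weights
    kernel * coverage, i.e. sum(w * M) / sum(w * cov).
    """
    smoothed = []
    for p in pos:
        num = den = 0.0
        for q, m, n in zip(pos, meth, cov):
            w = tricube(q - p, bandwidth)
            num += w * m
            den += w * n
        smoothed.append(num / den if den > 0 else None)
    return smoothed


# Hypothetical data: genomic positions, methylated read counts, coverage
pos = [100, 200, 300, 5000]
meth = [9, 1, 8, 0]
cov = [10, 2, 10, 5]
smoothed = smooth_beta(pos, meth, cov)
raw = [m / n for m, n in zip(meth, cov)]
```

The low-coverage CpG at position 200 has a noisy raw value of 0.5; after smoothing it is pulled towards its well-covered neighbours, while the isolated CpG at position 5000 is left unchanged.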

Differentially methylated regions

Cartoon (see e.g., bsseq::BSmooth.tstat() and bsseq::dmrFinder() for a proper implementation)
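
The cartoon can be sketched in plain Python as "threshold, then group": given one test statistic per CpG, keep CpGs beyond a cutoff and merge nearby CpGs with the same sign into candidate regions. This is a simplified stand-in, not the bsseq implementation; the cutoff, maximum gap, minimum number of CpGs, the "hyper"/"hypo" labels and the statistics themselves are all assumptions made for illustration.

```python
def find_regions(pos, stat, cutoff=2.0, max_gap=300, min_cpgs=3):
    """Group consecutive CpGs whose statistic exceeds the cutoff in the same direction."""

    def sign(s):
        return 1 if s >= cutoff else -1 if s <= -cutoff else 0

    runs, current = [], []
    for i, s in enumerate(stat):
        d = sign(s)
        # Extend the open run only for same-direction CpGs within max_gap bases
        extends = (
            current
            and d != 0
            and d == sign(stat[current[-1]])
            and pos[i] - pos[current[-1]] <= max_gap
        )
        if extends:
            current.append(i)
        else:
            if current:
                runs.append(current)
            current = [i] if d != 0 else []
    if current:
        runs.append(current)

    return [
        {
            "start": pos[r[0]],
            "end": pos[r[-1]],
            "n": len(r),
            "direction": "hyper" if stat[r[0]] > 0 else "hypo",
        }
        for r in runs
        if len(r) >= min_cpgs
    ]


# Hypothetical per-CpG statistics (e.g. t-statistics from smoothed data)
pos = [100, 150, 220, 300, 900, 950, 1000, 1040]
stat = [0.5, 3.1, 2.8, 4.0, -2.5, -3.0, 1.0, -2.2]
regions = find_regions(pos, stat)
print(regions)
```

Short runs are dropped by `min_cpgs`, which is one of the ad hoc choices that make DMR calling sensitive to its tuning parameters.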

Other downstream analyses

  • Using methylation patterns, e.g., epialleles

Visualisation

Yup

  • epiviz
  • lollipop plots

Data integration

Yup

  • AnnotationHub, biomaRt, etc.

Batch effects and technical biases

Yup

  • Cell composition bias

Summary

Links

TODOs

  • [ ] Include affiliation on title slide?
  • [ ] Add authors to key packages?
  • [ ] Check definitions of beta-values and M-values

Manual package curation (BioC 3.1)

TODO: Add links

Microarrays

  • BEclear (batch effects)
  • ChAMP
  • charm
  • COHCAP (+ BS-seq)
  • conumee
  • CopyNumber450k
  • DMRcate
  • DMRforPairs
  • ENmix
  • lumi
  • MethylAid
  • MethylMix
  • methylumi
  • minfi
  • missMethyl
  • shinyMethyl
  • skewr
  • wateRmelon
  • ANNOTATION

Sequencing

  • BiSeq
  • bsseq
  • DMRcaller
  • DSS
  • M3D
  • methylPipe
  • MethylSeekR
  • MPFE

Misc.

  • bumphunter (general)
  • coMET (visualisation of EWAS and 'co-methylation LD maps')
  • MassArray (Sequenom)
  • MEDIPS (MeDIP-seq)
  • MEDME (MeDIP-microarray)
  • methVisual (clone BS-seq)
  • methyAnalysis (mostly arrays with some seq, same dev as lumi)
  • methylMnM (MeDIP-seq and MRE-seq)
  • Repitools (MeDIP-seq)
  • RnBeads (slick pipeline run with "modules", builds on other packages)